LncRNA-ID: Long non-coding RNA IDentification using balanced random forests
Abstract
MOTIVATION: Long non-coding RNAs (lncRNAs), which are non-coding RNAs longer than 200 nucleotides, perform important biological functions such as regulating gene expression. To fully reveal the functions of lncRNAs, a fundamental step is to annotate them in various species. However, because lncRNAs tend to encode one or multiple open reading frames, it is not trivial to distinguish these long non-coding transcripts from protein-coding genes in transcriptomic data.
RESULTS: In this work, we design a new tool that calculates the coding potential of a transcript using a machine learning model (random forest) based on multiple features, including sequence characteristics of putative open reading frames, translation scores based on ribosomal coverage, and conservation against characterized protein families. The experimental results show that our tool competes favorably with existing coding potential computation tools in lncRNA identification.
AVAILABILITY AND IMPLEMENTATION: The scripts and data can be downloaded at https://github.com/zhangy72/LncRNA-ID.
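As a rough illustration of the first feature group named in the abstract (sequence characteristics of putative open reading frames), the sketch below extracts the longest forward-strand ORF from a transcript and derives two simple features from it. This is not the LncRNA-ID implementation: the function names `longest_orf` and `orf_features`, the ATG-to-stop ORF definition, the forward-strand-only scan, and the choice of length and coverage as features are all assumptions made for the example.

```python
# Illustrative sketch only; names and feature choices are hypothetical,
# not taken from the LncRNA-ID scripts.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward
    strand (end is exclusive and includes the stop codon), or None."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = None
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq):
    """Compute two assumed ORF features: length in nucleotides and the
    fraction of the transcript covered by the longest ORF."""
    orf = longest_orf(seq)
    length = 0 if orf is None else orf[1] - orf[0]
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}


if __name__ == "__main__":
    print(orf_features("CCATGAAATGA"))
```

Features of this kind would be combined with the translation and conservation scores mentioned in the abstract and passed to a random forest classifier; that step is omitted here because the abstract does not specify it.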
Similar resources
Induction of apoptosis and necrosis in human acute erythroleukemia cells by inhibition of long non-coding RNA PVT1
Recent advances in molecular medicine have proposed new therapeutic strategies for cancer. One of the molecular research lines for the diagnosis and treatment of cancer is the use of long non-coding RNAs (lncRNAs), a class of non-coding RNA molecules longer than 200 base pairs that act as key regulators of gene expression. Different aspects of cellular activities like cell...
NONCODEv4: exploring the world of long non-coding RNA genes
NONCODE (http://www.bioinfo.org/noncode/) is an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs). Non-coding RNAs (ncRNAs) have been implicated in diseases and identified as playing important roles in various biological processes. Since NONCODE version 3.0 was released 2 years ago, discovery of novel ncRNAs has been promoted by high-throughput RNA sequencing (RNA...
Identification and Functional Prediction of Long Non-Coding RNAs Responsive to Drought Stress in Lens culinaris L.
Drought stress is one of the main environmental factors that affect the growth and productivity of crop plants, including lentil. In the course of evolution, crucial genetic regulations mediated by non-coding RNAs (ncRNAs) have emerged in plants in response to drought and other abiotic stresses. In the present study, after identifying lncRNAs within the expression profile of lentil, RNA-s...
Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensi...
Diagnostic Value of Plasma Long Non-coding RNA HOTTIP as a Non-invasive Biomarker for Colorectal Cancer (A Case-Control Study)
Long non-coding RNAs (lncRNAs) associated with various cancers, including colorectal cancer (CRC), can be collected easily from body fluids. Our aims were to determine the expression level of HOTTIP lncRNA in plasma samples of healthy individuals and CRC patients, as well as its relationship with the clinico-pathological characteristics of patients. First, total RNA was extracted from the plasma...
Journal title:
- Bioinformatics
Volume 31, Issue 24
Pages -
Publication date: 2015